Our project data consist of 9 different products to be forecasted. In this homework we will develop forecasting models with ARIMA, seasonal ARIMA, and additional regressors. First we need to prepare our data and load the required libraries; then we will follow the steps indicated in the homework description.
Required Libraries:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
library(plotly)
## Warning: package 'plotly' was built under R version 4.0.5
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.5
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(forecast)
## Warning: package 'forecast' was built under R version 4.0.5
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(data.table)
## Warning: package 'data.table' was built under R version 4.0.5
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
Now it is time to read the data and manipulate it so that we can inspect each product separately. We call set.seed() before splitting so that any randomness in the workflow is reproducible.
data1 = read.csv("C:/Users/rtbol/Desktop/ProjectRawData.csv", header = TRUE)
data1$event_date = as.Date(data1$event_date)
data1 = data1[order(data1$event_date),]
set.seed(0)
split_list = split(data1, data1$product_content_id)
yuz_temizleyici = split_list[[1]]
islak_mendil = split_list[[2]]
kulaklik = split_list[[3]]
supurge = split_list[[4]]
tayt = split_list[[5]]
bikini327 = split_list[[6]]
dis_fircasi = split_list[[7]]
mont = split_list[[8]]
bikini733 = split_list[[9]]
From now on, each product sits in its own data table, and we will continue with “yuz_temizleyici” in Task 1.
ggplotly(ggplot(yuz_temizleyici, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Yüz Temizleyici Satış Grafiği") + xlab("Tarih") + ylab("Satış Miktarı"))
yuz_temizleyici = data.table(yuz_temizleyici)
This data is the cleanest among all products because its variables are clearly observed: there are no zero values in sold_count that could cause problems. Now we will look at the seasonality and trend of the time series at weekly, monthly, and quarterly periods. We will use the additive type of decomposition for all of the models below, so if we want to deseasonalize or detrend the observations we subtract the corresponding component.
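The subtract-the-component idea can be sketched on a small synthetic series (the series below is illustrative, not the project data; the report applies the same logic to sold_count):

```r
# Additive decomposition, then deseasonalize/detrend by subtraction.
# Synthetic daily series with a weekly (frequency = 7) pattern plus noise.
set.seed(1)
weekly_pattern <- c(10, 20, 25, 18, 8, 5, 6)
x <- ts(rep(weekly_pattern, 54) + rnorm(7 * 54), frequency = 7)

dec <- decompose(x, type = "additive")
deseasonalized <- x - dec$seasonal        # remove the seasonal component
detrended <- deseasonalized - dec$trend   # remove the trend (NA at both ends)
```

Because the model is additive, each subtraction removes exactly one component and leaves the others intact.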
ts_weekly_yuztemizleyici = ts(yuz_temizleyici$sold_count, freq = 7)
dec_weekly_yuztemizleyici = decompose(ts_weekly_yuztemizleyici, type = "additive")
plot(dec_weekly_yuztemizleyici)
dec_weekly_yuztemizleyici$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] -2.707851 11.564127 15.234457 8.184695 -5.933903 -13.825983
## [7] -12.515543
## ... (this 7-value weekly pattern repeats for the remaining weeks)
In the weekly decomposition, we see weekly seasonality and a choppy trend that tends to increase over the long run. The peaks in the data may be caused by Trendyol’s discount days.
ts_monthly_yuztemizleyici = ts(yuz_temizleyici$sold_count, freq = 30)
dec_monthly_yuztemizleyici = decompose(ts_monthly_yuztemizleyici, type = "additive")
plot(dec_monthly_yuztemizleyici)
The monthly decomposition also shows monthly seasonality, and its trend is increasing, especially in the second half of the data. Again, the effects of discount days can be seen clearly.
ts_quarterly_yuztemizleyici = ts(yuz_temizleyici$sold_count, freq = 90)
dec_quarterly_yuztemizleyici = decompose(ts_quarterly_yuztemizleyici, type = "additive")
plot(dec_quarterly_yuztemizleyici)
In the quarterly time series, the seasonal part is not clear, but the trend is increasing.
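Claims like “the seasonal part is not clear” can be made quantitative with a strength-of-seasonality score (the helper name is ours; the formula follows Hyndman and Athanasopoulos’s measure, which is not computed in the report itself):

```r
# Strength of seasonality: F_s = max(0, 1 - Var(remainder) / Var(seasonal + remainder)).
# Values near 1 mean a clear seasonal pattern; values near 0 mean mostly noise.
seasonal_strength <- function(dec) {
  ok <- !is.na(dec$random)                 # decompose() leaves NAs at both ends
  r  <- as.numeric(dec$random[ok])
  s  <- as.numeric(dec$seasonal[ok])
  max(0, 1 - var(r) / var(s + r))
}
# e.g. compare seasonal_strength(dec_weekly_yuztemizleyici)
#      with seasonal_strength(dec_quarterly_yuztemizleyici)
```

Comparing the weekly and quarterly scores would confirm numerically which periodicity is the dominant one.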
Now it is time to plot another product, “islak_mendil”.
ggplotly(ggplot(islak_mendil, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Islak Mendil Satış Grafiği") + xlab("Tarih") + ylab("Satış Miktarı"))
islak_mendil = data.table(islak_mendil)
ts_weekly_islakmendil = ts(islak_mendil$sold_count, freq = 7)
dec_weekly_islakmendil = decompose(ts_weekly_islakmendil, type = "additive")
plot(dec_weekly_islakmendil)
dec_weekly_islakmendil$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] 51.86553 111.59904 79.18970 62.66632 -76.28516 -135.34052
## [7] -93.69491
## ... (this 7-value weekly pattern repeats for the remaining weeks)
There is weekly seasonality in the islak_mendil data. The trend is neither increasing nor decreasing overall, but it is highly affected by discount days.
ts_monthly_islakmendil = ts(islak_mendil$sold_count, freq = 30)
dec_monthly_islakmendil = decompose(ts_monthly_islakmendil, type = "additive")
plot(dec_monthly_islakmendil)
Monthly seasonality can be seen in the seasonal part of the decomposition. The trend increases until the discount days; after a few discount-day peaks it goes down again and shows another increase in the last part of the data.
ts_quarterly_islakmendil = ts(islak_mendil$sold_count, freq = 90)
dec_quarterly_islakmendil = decompose(ts_quarterly_islakmendil, type = "additive")
plot(dec_quarterly_islakmendil)
There is quarterly seasonality in the data, but again it is not clear. The trend is increasing in the long run, though there are some waves in it.
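The repeated references to discount-day peaks can also be checked directly: days whose decomposition remainder is an outlier are natural candidates for special-day effects. A minimal sketch (the 1.5 × IQR threshold is a common rule of thumb, and the helper name is ours):

```r
# Flag candidate "special days": observations whose remainder falls outside
# 1.5 * IQR of the remainder distribution.
flag_outlier_days <- function(dec) {
  r <- as.numeric(dec$random)
  q <- quantile(r, c(0.25, 0.75), na.rm = TRUE)
  cutoff <- 1.5 * (q[2] - q[1])
  which(!is.na(r) & (r < q[1] - cutoff | r > q[2] + cutoff))
}
# e.g. flag_outlier_days(dec_weekly_islakmendil) returns the indices of spike days
```

The returned indices can then be cross-checked against the known discount dates.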
The next data set is “kulaklik”. First we plot the data to see its general structure.
ggplotly(ggplot(kulaklik, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Bluetooth Kulaklık Satış Grafiği") + xlab("Tarih") + ylab("Satış Miktarı"))
kulaklik = data.table(kulaklik)
ts_weekly_kulaklik = ts(kulaklik$sold_count, freq = 7)
dec_weekly_kulaklik = decompose(ts_weekly_kulaklik , type = "additive")
plot(dec_weekly_kulaklik)
dec_weekly_kulaklik$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] -4.675476 16.717381 22.121227 41.793060 -1.495350 -23.785366
## [7] -50.675476
## ... (this 7-value weekly pattern repeats for the remaining weeks)
There is weekly seasonality and a decreasing trend in the long run. However, the effects of the discount days interrupt the continuous decrease of the trend.
ts_monthly_kulaklik = ts(kulaklik$sold_count, freq = 30)
dec_monthly_kulaklik = decompose(ts_monthly_kulaklik, type = "additive")
plot(dec_monthly_kulaklik)
Monthly seasonality can be seen in the seasonal part of the decomposition. The trend is decreasing, but it climbs back to its initial level at the end of the data.
ts_quarterly_kulaklik = ts(kulaklik$sold_count, freq = 90)
dec_quarterly_kulaklik = decompose(ts_quarterly_kulaklik, type = "additive")
plot(dec_quarterly_kulaklik)
We can say that there is quarterly seasonality in the data. In the trend part, there is a decreasing trend in the first half of the data and then an increasing trend at the end.
Our next data set is “supurge”. We can see from the plot that it does not have large sales numbers, except for some peaks on discount days.
ggplotly(ggplot(supurge, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Dik Süpürge Satış Grafiği") + xlab("Tarih") + ylab("Satış Miktarı"))
supurge = data.table(supurge)
ts_weekly_supurge = ts(supurge$sold_count, freq = 7)
dec_weekly_supurge = decompose(ts_weekly_supurge , type = "additive")
plot(dec_weekly_supurge)
dec_weekly_supurge$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] -1.755028 5.346620 8.099368 4.924891 -4.336564 -7.614918 -4.664369
## ... (this 7-value weekly pattern repeats for the remaining weeks)
Weekly seasonality exists, just like in the previous products. The trend is slightly decreasing, again with the exception of discount days.
ts_monthly_supurge = ts(supurge$sold_count, freq = 30)
dec_monthly_supurge = decompose(ts_monthly_supurge, type = "additive")
plot(dec_monthly_supurge)
The monthly seasonality is again similar to the other products. The trend plot in the decomposition is wavy: it sometimes increases and sometimes decreases.
ts_quarterly_supurge = ts(supurge$sold_count, freq = 90)
dec_quarterly_supurge = decompose(ts_quarterly_supurge, type = "additive")
plot(dec_quarterly_supurge)
There is quarterly seasonality in the data, but it is neither clear nor regular. The trend increases until the discount days and then decreases.
The next product is “tayt”. The effect of discount days, especially “9-10-11 Nov”, is more obvious in “tayt” than in any other product.
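Since the homework also covers additional regressors, discount periods like “9-10-11 Nov” can be encoded as a 0/1 dummy for later use as an external regressor (e.g. the xreg argument of forecast::auto.arima). A minimal sketch; the calendar dates below are assumed for illustration:

```r
# Build a 0/1 dummy marking known discount days over a date range.
dates <- seq(as.Date("2020-11-01"), as.Date("2020-11-30"), by = "day")
discount_days <- as.Date(c("2020-11-09", "2020-11-10", "2020-11-11"))  # assumed dates
is_discount <- as.integer(dates %in% discount_days)  # 1 on discount days, 0 otherwise
```

Such a column lets the model attribute the spikes to the event instead of folding them into the seasonal or trend component.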
ggplotly(ggplot(tayt, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Tayt Satış Grafiği") + xlab("Tarih") + ylab("Satış Miktarı"))
tayt = data.table(tayt)
We will look at the “tayt” data at the weekly, monthly, and quarterly levels. The decompositions at these levels again show almost the same features as the previous products: weekly and monthly seasonality is clear, while the quarterly seasonality has some outliers. The trend of the decomposition increases from the middle of the data and then decreases.
ts_weekly_tayt = ts(tayt$sold_count, freq = 7)
dec_weekly_tayt = decompose(ts_weekly_tayt , type = "additive")
plot(dec_weekly_tayt)
dec_weekly_tayt$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] -2.882912 139.490714 244.237967 236.041979 -131.292253 -255.825220
## [7] -229.770275
## ... (this 7-value weekly pattern repeats for the remaining weeks)
ts_monthly_tayt = ts(tayt$sold_count, freq = 30)
dec_monthly_tayt = decompose(ts_monthly_tayt, type = "additive")
plot(dec_monthly_tayt)
ts_quarterly_tayt = ts(tayt$sold_count, freq = 90)
dec_quarterly_tayt = decompose(ts_quarterly_tayt, type = "additive")
plot(dec_quarterly_tayt)
Now we will look at the first bikini data set, “bikini327”. It can be seen in the plot that this data is not well suited for model building, but we have to proceed anyway. So we will continue with this data, although it will most probably give us large errors in the end.
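How unsuitable a series is can be summarized by its share of zero-sales days: a high proportion indicates intermittent demand, where standard decomposition and ARIMA tend to struggle. A sketch using a synthetic sparse vector (the project data itself is not reproduced here):

```r
# Share of zero-sales days in a sold_count vector.
zero_share <- function(sold) mean(sold == 0, na.rm = TRUE)

# Illustrative sparse series (assumption): sales occur on few days, zeros otherwise.
set.seed(4)
sparse_sales <- rbinom(200, size = 1, prob = 0.15) * rpois(200, lambda = 3)
zero_share(sparse_sales)  # a large fraction of the days have zero sales
```

Applying zero_share() to each product would make “appropriate for modelling” a measurable statement rather than a visual impression.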
ggplotly(ggplot(bikini327, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Bikini Üstü Satış Grafiği") + xlab("Tarih") + ylab("Satış Miktarı"))
bikini327 = data.table(bikini327)
ts_weekly_bikini327 = ts(bikini327$sold_count, freq = 7)
dec_weekly_bikini327 = decompose(ts_weekly_bikini327 , type = "additive")
plot(dec_weekly_bikini327)
dec_weekly_bikini327$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] -0.63613341 0.71276769 0.49573472 0.09090371 -0.27836853 -0.71305649
## [7] 0.32815231
## (this 7-value weekly pattern repeats through index 372; output truncated)
ts_monthly_bikini327 = ts(bikini327$sold_count, freq = 30)
dec_monthly_bikini327 = decompose(ts_monthly_bikini327, type = "additive")
plot(dec_monthly_bikini327)
ts_quarterly_bikini327 = ts(bikini327$sold_count, freq = 90)
dec_quarterly_bikini327 = decompose(ts_quarterly_bikini327, type = "additive")
plot(dec_quarterly_bikini327)
Since the data are dirty, it is hard to comment on the trend component, but we can say that there is some seasonality in all of the decompositions above.
The next product is “dis_fircasi”, whose data are cleaner than the previous bikini series. In general, we can say that sold_count is increasing.
ggplotly(ggplot(dis_fircasi, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Rechargeable Toothbrush Sales") + xlab("Date") + ylab("Sales Count"))
dis_fircasi = data.table(dis_fircasi)
ts_weekly_disfircasi = ts(dis_fircasi$sold_count, freq = 7)
dec_weekly_disfircasi = decompose(ts_weekly_disfircasi , type = "additive")
plot(dec_weekly_disfircasi)
dec_weekly_disfircasi$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] 3.362149 12.062698 13.647863 2.889414 -8.738619 -15.423566
## [7] -7.799939
## (this 7-value weekly pattern repeats through index 372; output truncated)
ts_monthly_disfircasi = ts(dis_fircasi$sold_count, freq = 30)
dec_monthly_disfircasi = decompose(ts_monthly_disfircasi, type = "additive")
plot(dec_monthly_disfircasi)
ts_quarterly_disfircasi = ts(dis_fircasi$sold_count, freq = 90)
dec_quarterly_disfircasi = decompose(ts_quarterly_disfircasi, type = "additive")
plot(dec_quarterly_disfircasi)
In the decompositions, the general increasing structure of the data can be seen in the trend component at every level. However, the seasonal component of the quarterly decomposition is not clear. We can say that there is both weekly and monthly seasonality.
The next product, “mont”, has one of the worst series. It contains too many 0s, which will lead us to predict wrong or meaningless values, but we will decompose it at the same frequencies as the other products.
ggplotly(ggplot(mont, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Coat Sales") + xlab("Date") + ylab("Sales Count"))
mont = data.table(mont)
ts_weekly_mont = ts(mont$sold_count, freq = 7)
dec_weekly_mont = decompose(ts_weekly_mont , type = "additive")
plot(dec_weekly_mont)
dec_weekly_mont$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] -0.1096013 -0.1315793 1.0524866 0.1189390 -0.1506028 -0.4227881
## [7] -0.3568541
## (this 7-value weekly pattern repeats through index 372; output truncated)
ts_monthly_mont = ts(mont$sold_count, freq = 30)
dec_monthly_mont = decompose(ts_monthly_mont, type = "additive")
plot(dec_monthly_mont)
ts_quarterly_mont = ts(mont$sold_count, freq = 90)
dec_quarterly_mont = decompose(ts_quarterly_mont, type = "additive")
plot(dec_quarterly_mont)
Seasonality is visible in all of the decompositions above; the trend only takes meaningful values where the actual sales are nonzero.
The next product is the second bikini series, “bikini733”. This is also dirty data: only the last part of the series contains meaningful values, while the older observations are always 0. These zeros will most likely lead to predictions with large errors.
ggplotly(ggplot(bikini733, aes(x=event_date,y=sold_count)) + geom_line() + ggtitle("Bikini Top Sales") + xlab("Date") + ylab("Sales Count"))
bikini733 = data.table(bikini733)
ts_weekly_bikini733 = ts(bikini733$sold_count, freq = 7)
dec_weekly_bikini733 = decompose(ts_weekly_bikini733 , type = "additive")
plot(dec_weekly_bikini733)
dec_weekly_bikini733$seasonal
## Time Series:
## Start = c(1, 1)
## End = c(54, 1)
## Frequency = 7
## [1] -2.2157001 -1.9602056 0.1249593 -0.4200853 -0.4766891 2.9601241
## [7] 1.9875966
## (this 7-value weekly pattern repeats through index 372; output truncated)
ts_monthly_bikini733 = ts(bikini733$sold_count, freq = 30)
dec_monthly_bikini733 = decompose(ts_monthly_bikini733, type = "additive")
plot(dec_monthly_bikini733)
ts_quarterly_bikini733 = ts(bikini733$sold_count, freq = 90)
dec_quarterly_bikini733 = decompose(ts_quarterly_bikini733, type = "additive")
plot(dec_quarterly_bikini733)
There is an increasing trend at the end of the series, and seasonality exists at the weekly, monthly, and quarterly levels.
We will apply the auto.arima function to the deseasonalized and detrended (random) parts of each product to determine the best ARIMA or seasonal ARIMA model. We decided to use the weekly decomposition for all products because it gave the clearest trend and seasonal components across products, and using the same frequency in all models keeps them comparable.
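The per-product steps that follow (subtract the weekly seasonal, subtract the trend, fit auto.arima to what remains) can be wrapped in a small helper so the same pipeline runs for every product. `fit_random_arima` is a hypothetical name introduced here for illustration, not part of the original code:

```r
library(forecast)

# Hypothetical helper: decompose a daily sales vector at a given frequency,
# remove the seasonal and trend components, and fit an ARIMA model to the
# remaining (random) part with auto.arima.
fit_random_arima <- function(sold, freq = 7) {
  x   <- ts(sold, frequency = freq)
  dec <- decompose(x, type = "additive")
  random_part <- x - dec$seasonal - dec$trend  # NA at both ends (moving average)
  auto.arima(random_part)
}
```

For example, `fit_random_arima(tayt$sold_count)` would reproduce the tayt model below in one call.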
deseasonal_yuztemizleyici = ts_weekly_yuztemizleyici - dec_weekly_yuztemizleyici$seasonal
detrended_yuztemizleyici = deseasonal_yuztemizleyici - dec_weekly_yuztemizleyici$trend
arima_yuztemizleyici = auto.arima(detrended_yuztemizleyici)
arima_yuztemizleyici
## Series: detrended_yuztemizleyici
## ARIMA(4,0,0)(0,0,1)[7] with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 sma1
## 0.1755 -0.1073 -0.2948 -0.1908 -0.2228
## s.e. 0.0517 0.0500 0.0497 0.0553 0.0598
##
## sigma^2 estimated as 856.5: log likelihood=-1752.97
## AIC=3517.94 AICc=3518.18 BIC=3541.36
checkresiduals(arima_yuztemizleyici)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,0,0)(0,0,1)[7] with zero mean
## Q* = 36.249, df = 9, p-value = 3.58e-05
##
## Model df: 5. Total lags used: 14
This gives an ARIMA(4,0,0)(0,0,1)[7] model as the best fit, with an AIC of 3517.94. The p-value of the Ljung-Box test is small; however, there are not many significantly correlated lags in the ACF plot produced by checkresiduals().
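The Ljung-Box statistic that checkresiduals() reports can also be computed directly with Box.test(), which makes the degrees-of-freedom bookkeeping explicit (lags used minus model df). A minimal sketch on a simulated AR(1) series, not the actual product data:

```r
set.seed(0)
y   <- arima.sim(model = list(ar = 0.5), n = 372)  # simulated stand-in series
fit <- arima(y, order = c(1, 0, 0))

# 14 lags tested, 1 parameter estimated -> test df = 14 - 1 = 13
Box.test(residuals(fit), lag = 14, fitdf = 1, type = "Ljung-Box")
```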
deseasonal_islakmendil = ts_weekly_islakmendil - dec_weekly_islakmendil$seasonal
detrended_islakmendil = deseasonal_islakmendil - dec_weekly_islakmendil$trend
arima_islakmendil = auto.arima(detrended_islakmendil)
arima_islakmendil
## Series: detrended_islakmendil
## ARIMA(0,0,1) with zero mean
##
## Coefficients:
## ma1
## 0.2775
## s.e. 0.0471
##
## sigma^2 estimated as 60370: log likelihood=-2533.38
## AIC=5070.76 AICc=5070.79 BIC=5078.56
checkresiduals(arima_islakmendil)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with zero mean
## Q* = 55.877, df = 13, p-value = 2.833e-07
##
## Model df: 1. Total lags used: 14
The “islakmendil” data give an MA(1) model with an AIC of 5070.76, which is somewhat high. The Ljung-Box test gives a small p-value, which is not good for us. The ACF plot has one large spike at lag 2; otherwise there is nothing notable.
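Rather than eyeballing which spikes leave the bands, the offending lags can be listed numerically: autocorrelations beyond roughly ±1.96/sqrt(n) are the "highly correlated" ones. A sketch on white noise (where typically no lag should exceed the band); the series `x` is simulated, not the model residuals:

```r
set.seed(0)
x  <- rnorm(372)                      # stand-in residual series
a  <- acf(x, lag.max = 14, plot = FALSE)
ci <- 1.96 / sqrt(length(x))          # approximate 95% confidence band
which(abs(a$acf[-1]) > ci)            # lags whose autocorrelation exceeds the band
```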
deseasonal_kulaklik = ts_weekly_kulaklik - dec_weekly_kulaklik$seasonal
detrended_kulaklik = deseasonal_kulaklik - dec_weekly_kulaklik$trend
arima_kulaklik = auto.arima(detrended_kulaklik)
arima_kulaklik
## Series: detrended_kulaklik
## ARIMA(0,0,1) with zero mean
##
## Coefficients:
## ma1
## 0.1788
## s.e. 0.0501
##
## sigma^2 estimated as 14569: log likelihood=-2273.2
## AIC=4550.4 AICc=4550.43 BIC=4558.2
checkresiduals(arima_kulaklik)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with zero mean
## Q* = 64.982, df = 13, p-value = 6.643e-09
##
## Model df: 1. Total lags used: 14
For “kulaklik”, auto.arima again gives an MA(1) model, this time with an AIC of 4550.4. The p-value is again poor — in fact even lower than in the previous model. The same lag as in “islakmendil” is highly correlated; the other lags stay within the confidence band of the ACF.
deseasonal_supurge = ts_weekly_supurge - dec_weekly_supurge$seasonal
detrended_supurge = deseasonal_supurge - dec_weekly_supurge$trend
arima_supurge = auto.arima(detrended_supurge)
arima_supurge
## Series: detrended_supurge
## ARIMA(2,0,2)(0,0,1)[7] with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 mean
## 0.6851 -0.7537 -0.4387 0.6917 -0.0147 -0.0098
## s.e. 0.0914 0.0826 0.0884 0.1095 0.0594 1.1356
##
## sigma^2 estimated as 359.3: log likelihood=-1593.28
## AIC=3200.57 AICc=3200.88 BIC=3227.89
checkresiduals(arima_supurge)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,0,1)[7] with non-zero mean
## Q* = 40.299, df = 8, p-value = 2.818e-06
##
## Model df: 6. Total lags used: 14
For the “supurge” data we obtain an ARIMA(2,0,2)(0,0,1)[7] model with an AIC of 3200.57. The p-value is again low, and some lags in the ACF are highly correlated.
deseasonal_tayt = ts_weekly_tayt - dec_weekly_tayt$seasonal
detrended_tayt = deseasonal_tayt - dec_weekly_tayt$trend
arima_tayt = auto.arima(detrended_tayt)
arima_tayt
## Series: detrended_tayt
## ARIMA(0,0,2) with non-zero mean
##
## Coefficients:
## ma1 ma2 mean
## 0.3960 0.1069 -0.0568
## s.e. 0.0529 0.0661 42.9767
##
## sigma^2 estimated as 302410: log likelihood=-2827.28
## AIC=5662.56 AICc=5662.67 BIC=5678.17
checkresiduals(arima_tayt)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,2) with non-zero mean
## Q* = 65.779, df = 11, p-value = 7.688e-10
##
## Model df: 3. Total lags used: 14
According to auto.arima, an MA(2) model is the best fit for “tayt”. Its AIC of 5662.56 is somewhat high. Some lags are highly correlated in the ACF, and the residuals do not follow a normal distribution; the high sales peaks on discount days may be the reason.
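Since the discount-day spikes are suspected of breaking normality, one common remedy is to pass a promotion dummy to auto.arima via xreg, so the spikes are absorbed by a regressor instead of inflating the error term. A hedged sketch on simulated data; `discount_days` is a made-up indicator, not a column of the real data set:

```r
library(forecast)

set.seed(0)
n <- 364
discount_days <- as.numeric(runif(n) < 0.05)                  # fake promotion flags
y <- 50 + 300 * discount_days + arima.sim(list(ma = 0.4), n = n)

fit <- auto.arima(ts(y, frequency = 7), xreg = discount_days)
fit  # the xreg coefficient should recover a large positive discount effect
```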
deseasonal_bikini327 = ts_weekly_bikini327 - dec_weekly_bikini327$seasonal
detrended_bikini327 = deseasonal_bikini327 - dec_weekly_bikini327$trend
arima_bikini327 = auto.arima(detrended_bikini327)
arima_bikini327
## Series: detrended_bikini327
## ARIMA(0,0,1)(0,0,1)[7] with zero mean
##
## Coefficients:
## ma1 sma1
## 0.2940 -0.0491
## s.e. 0.0494 0.0584
##
## sigma^2 estimated as 30.29: log likelihood=-1142.54
## AIC=2291.07 AICc=2291.14 BIC=2302.78
checkresiduals(arima_bikini327)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,0,1)[7] with zero mean
## Q* = 116, df = 12, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 14
ARIMA(0,0,1)(0,0,1)[7] is the model chosen by auto.arima, with an AIC of 2291.07. Since this series is poor, with too many 0 values, the ACF plot shows several highly correlated lags.
deseasonal_disfircasi = ts_weekly_disfircasi - dec_weekly_disfircasi$seasonal
detrended_disfircasi = deseasonal_disfircasi - dec_weekly_disfircasi$trend
arima_disfircasi = auto.arima(detrended_disfircasi)
arima_disfircasi
## Series: detrended_disfircasi
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.3480 0.1358
## s.e. 0.0487 2.3324
##
## sigma^2 estimated as 1103: log likelihood=-1800.51
## AIC=3607.01 AICc=3607.08 BIC=3618.72
checkresiduals(arima_disfircasi)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 57.431, df = 12, p-value = 6.607e-08
##
## Model df: 2. Total lags used: 14
auto.arima gives an MA(1) model for “dis_fircasi”, with an AIC of 3607.01. Lags 2 and 3 in particular are highly correlated in the ACF plot.
deseasonal_mont = ts_weekly_mont - dec_weekly_mont$seasonal
detrended_mont = deseasonal_mont - dec_weekly_mont$trend
arima_mont = auto.arima(detrended_mont)
arima_mont
## Series: detrended_mont
## ARIMA(5,0,0) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5
## -0.1857 -0.4418 -0.3930 -0.3343 -0.1499
## s.e. 0.0516 0.0495 0.0504 0.0493 0.0514
##
## sigma^2 estimated as 5.34: log likelihood=-823.93
## AIC=1659.86 AICc=1660.09 BIC=1683.27
checkresiduals(arima_mont)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,0,0) with zero mean
## Q* = 53.917, df = 9, p-value = 1.958e-08
##
## Model df: 5. Total lags used: 14
The “mont” series was the worst among our products; auto.arima selects an AR(5) model with an AIC of 1659.86. Lag 6 is the most correlated lag in the ACF, and the residuals do not follow a normal distribution.
deseasonal_bikini733 = ts_weekly_bikini733 - dec_weekly_bikini733$seasonal
detrended_bikini733 = deseasonal_bikini733 - dec_weekly_bikini733$trend
arima_bikini733 = auto.arima(detrended_bikini733)
arima_bikini733
## Series: detrended_bikini733
## ARIMA(2,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ma1
## 0.8678 -0.3955 -0.9724
## s.e. 0.0489 0.0487 0.0148
##
## sigma^2 estimated as 59.4: log likelihood=-1266.48
## AIC=2540.97 AICc=2541.08 BIC=2556.58
checkresiduals(arima_bikini733)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,1) with zero mean
## Q* = 43.29, df = 11, p-value = 9.67e-06
##
## Model df: 3. Total lags used: 14
The last product is “bikini733”, for which we obtain an ARIMA(2,0,1) model with an AIC of 2540.97. This series was also poor, with too many empty (zero) values.
In this part we reuse the regression models from the project, where we had selected the best models based on their R-squared values and the significance of their regressors. Here we check significance again, and most of the regressors are highly significant.
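The comparison criteria just mentioned (adjusted R-squared and regressor p-values) can be pulled out of an lm fit programmatically, which is convenient when ranking several candidate models. A sketch on simulated data shaped like the first regression below; the coefficients are invented for illustration:

```r
set.seed(0)
n <- 100
price        <- runif(n, 50, 90)
basket_count <- rpois(n, 500)
sold_count   <- 348 - 4.5 * price + 0.22 * basket_count + rnorm(n, sd = 25)

fit <- lm(sold_count ~ price + basket_count)
summary(fit)$adj.r.squared                 # adjusted R-squared of the model
summary(fit)$coefficients[, "Pr(>|t|)"]    # p-value of each regressor
```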
temizleyici_reg = lm(sold_count~price+basket_count, yuz_temizleyici)
summary(temizleyici_reg)
##
## Call:
## lm(formula = sold_count ~ price + basket_count, data = yuz_temizleyici)
##
## Residuals:
## Min 1Q Median 3Q Max
## -68.272 -12.944 -1.359 8.750 234.132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 348.110025 23.519827 14.80 <2e-16 ***
## price -4.542409 0.304987 -14.89 <2e-16 ***
## basket_count 0.222676 0.006096 36.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.71 on 369 degrees of freedom
## Multiple R-squared: 0.7948, Adjusted R-squared: 0.7937
## F-statistic: 714.7 on 2 and 369 DF, p-value: < 2.2e-16
checkresiduals(temizleyici_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 86.388, df = 10, p-value = 2.777e-14
Adjusted R-squared = 0.7937, and both regressors are highly significant. The first two lags of the residuals are highly correlated in the ACF. Since the models used in this homework do not include a discount variable, this correlation can be considered normal.
islakmendil_reg = lm(sold_count~basket_count+category_sold+category_visits+category_favored, islak_mendil)
summary(islakmendil_reg)
##
## Call:
## lm(formula = sold_count ~ basket_count + category_sold + category_visits +
## category_favored, data = islak_mendil)
##
## Residuals:
## Min 1Q Median 3Q Max
## -524.87 -36.53 10.21 38.78 928.92
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.06721 9.82354 0.821 0.412
## basket_count 0.26065 0.01137 22.920 <2e-16 ***
## category_sold 0.22682 0.00819 27.696 <2e-16 ***
## category_visits -0.01015 0.00102 -9.953 <2e-16 ***
## category_favored -0.02442 0.00200 -12.212 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 108.6 on 367 degrees of freedom
## Multiple R-squared: 0.9346, Adjusted R-squared: 0.9339
## F-statistic: 1311 on 4 and 367 DF, p-value: < 2.2e-16
checkresiduals(islakmendil_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 82.63, df = 10, p-value = 1.529e-13
Adjusted R-squared = 0.9339, which is very good. All of the regressors are highly significant. The first three lags of the residuals are highly correlated in the ACF.
kulaklik_reg = lm(sold_count~visit_count+basket_count+category_sold+category_visits, kulaklik)
summary(kulaklik_reg)
##
## Call:
## lm(formula = sold_count ~ visit_count + basket_count + category_sold +
## category_visits, data = kulaklik)
##
## Residuals:
## Min 1Q Median 3Q Max
## -515.18 -50.78 1.72 58.08 233.50
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.106e+01 1.183e+01 -5.163 4e-07 ***
## visit_count -4.464e-03 4.889e-04 -9.132 <2e-16 ***
## basket_count 1.879e-01 8.362e-03 22.468 <2e-16 ***
## category_sold 2.534e-01 1.970e-02 12.861 <2e-16 ***
## category_visits -3.215e-03 3.202e-04 -10.039 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 89.2 on 367 degrees of freedom
## Multiple R-squared: 0.8345, Adjusted R-squared: 0.8327
## F-statistic: 462.8 on 4 and 367 DF, p-value: < 2.2e-16
checkresiduals(kulaklik_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 183.03, df = 10, p-value < 2.2e-16
Adjusted R-squared = 0.8327, which is reasonably good. The residual histogram in the checkresiduals() output can be considered approximately normal, with some exceptions. However, the ACF plot shows that the residuals are highly correlated at every lag.
supurge_reg = lm(sold_count~favored_count+basket_count+category_sold+category_visits+ty_visits, supurge)
summary(supurge_reg)
##
## Call:
## lm(formula = sold_count ~ favored_count + basket_count + category_sold +
## category_visits + ty_visits, data = supurge)
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.315 -5.497 1.338 6.128 70.688
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.483e+00 1.200e+00 -3.737 0.000216 ***
## favored_count -2.867e-02 6.055e-03 -4.734 3.15e-06 ***
## basket_count 1.494e-01 8.261e-03 18.084 < 2e-16 ***
## category_sold 1.731e-01 7.280e-03 23.770 < 2e-16 ***
## category_visits -1.323e-03 9.655e-05 -13.705 < 2e-16 ***
## ty_visits -8.174e-08 1.413e-08 -5.784 1.57e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.06 on 366 degrees of freedom
## Multiple R-squared: 0.9066, Adjusted R-squared: 0.9053
## F-statistic: 710.1 on 5 and 366 DF, p-value: < 2.2e-16
checkresiduals(supurge_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 89.418, df = 10, p-value = 6.986e-15
Adjusted R-squared = 0.9053 and all of the regressors are highly significant, which makes this model promising for prediction. However, the residual ACF shows high correlation at many lags.
tayt_reg = lm(sold_count~price+visit_count+basket_count+category_sold+category_visits, tayt)
summary(tayt_reg)
##
## Call:
## lm(formula = sold_count ~ price + visit_count + basket_count +
## category_sold + category_visits, data = tayt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -850.99 -90.91 19.94 87.44 2130.71
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.881e+02 1.442e+02 3.385 0.000788 ***
## price -1.206e+01 2.873e+00 -4.200 3.36e-05 ***
## visit_count -9.113e-03 8.952e-04 -10.179 < 2e-16 ***
## basket_count 4.781e-02 7.583e-03 6.304 8.35e-10 ***
## category_sold 5.114e-01 1.455e-02 35.155 < 2e-16 ***
## category_visits -5.409e-03 3.116e-04 -17.357 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 264.7 on 366 degrees of freedom
## Multiple R-squared: 0.9428, Adjusted R-squared: 0.942
## F-statistic: 1206 on 5 and 366 DF, p-value: < 2.2e-16
checkresiduals(tayt_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 201.41, df = 10, p-value < 2.2e-16
Adjusted R-squared = 0.942 and all of the regressors are again highly significant. In the ACF, the first three lags show high correlation. Even so, we consider this model good enough.
bikini327_reg = lm(sold_count~basket_count, bikini327)
summary(bikini327_reg)
##
## Call:
## lm(formula = sold_count ~ basket_count, data = bikini327)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.9509 -0.5322 -0.5322 -0.5322 22.8394
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.53215 0.32868 1.619 0.106
## basket_count 0.16181 0.00276 58.624 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.318 on 370 degrees of freedom
## Multiple R-squared: 0.9028, Adjusted R-squared: 0.9025
## F-statistic: 3437 on 1 and 370 DF, p-value: < 2.2e-16
checkresiduals(bikini327_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 131.51, df = 10, p-value < 2.2e-16
This time Adjusted R-squared = 0.9025 and the single regressor is highly significant. However, the data are of poor quality, with many zeros; the effect is visible in the checkresiduals() output. These zeros are likely to lead to poor predictions.
disfircasi_reg = lm(sold_count~visit_count+favored_count+basket_count+ty_visits,dis_fircasi)
summary(disfircasi_reg)
##
## Call:
## lm(formula = sold_count ~ visit_count + favored_count + basket_count +
## ty_visits, data = dis_fircasi)
##
## Residuals:
## Min 1Q Median 3Q Max
## -149.982 -10.183 -0.171 7.197 103.341
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.127e+00 2.356e+00 -2.177 0.0302 *
## visit_count -1.084e-02 1.727e-03 -6.275 9.87e-10 ***
## favored_count -3.096e-02 6.197e-03 -4.996 9.05e-07 ***
## basket_count 2.956e-01 9.820e-03 30.098 < 2e-16 ***
## ty_visits 3.794e-07 6.819e-08 5.564 5.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.56 on 367 degrees of freedom
## Multiple R-squared: 0.9222, Adjusted R-squared: 0.9214
## F-statistic: 1088 on 4 and 367 DF, p-value: < 2.2e-16
checkresiduals(disfircasi_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 126.14, df = 10, p-value < 2.2e-16
Adjusted R-squared = 0.9214 and all of the regressors are highly significant. The ACF shows high correlation up to lag 13.
Since the mont data in our project are a little different, we first fit a model with all possible regressors and then keep only the most significant ones in a second model. In the second model Adjusted R-squared = 0.8418 and both regressors are highly significant; however, most of the residual lags still show high correlation. Given the poor quality of the data, this can again be considered normal.
mont_reg = lm(sold_count~.,mont)
summary(mont_reg)
##
## Call:
## lm(formula = sold_count ~ ., data = mont)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5395 -1.3992 -0.2107 0.9868 7.2246
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.614e+02 3.482e+02 1.613 0.1124
## event_date -2.982e-02 1.867e-02 -1.597 0.1157
## product_content_id NA NA NA NA
## price -6.906e-03 3.521e-03 -1.961 0.0547 .
## visit_count 4.802e-03 1.985e-02 0.242 0.8097
## basket_count 1.956e-01 1.016e-02 19.262 < 2e-16 ***
## favored_count -5.581e-02 1.692e-01 -0.330 0.7427
## category_sold 1.546e-03 1.079e-03 1.433 0.1573
## category_visits -8.157e-06 1.163e-05 -0.701 0.4859
## category_basket -6.823e-07 1.976e-06 -0.345 0.7312
## category_favored -6.327e-05 1.191e-05 -5.313 1.86e-06 ***
## category_brand_sold 1.732e-05 9.207e-06 1.881 0.0651 .
## ty_visits 1.511e-08 2.558e-08 0.591 0.5570
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.577 on 57 degrees of freedom
## (303 observations deleted due to missingness)
## Multiple R-squared: 0.8921, Adjusted R-squared: 0.8713
## F-statistic: 42.85 on 11 and 57 DF, p-value: < 2.2e-16
checkresiduals(mont_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 16
##
## data: Residuals
## LM test = 18.784, df = 16, p-value = 0.28
mont_reg2 = lm(sold_count~basket_count+category_favored, mont)
summary(mont_reg2)
##
## Call:
## lm(formula = sold_count ~ basket_count + category_favored, data = mont)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3295 -0.4439 -0.1350 0.4664 8.3138
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.248e-01 1.050e-01 5.000 8.89e-07 ***
## basket_count 1.837e-01 4.187e-03 43.868 < 2e-16 ***
## category_favored -1.930e-05 2.282e-06 -8.458 6.44e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.406 on 369 degrees of freedom
## Multiple R-squared: 0.8427, Adjusted R-squared: 0.8418
## F-statistic: 988.1 on 2 and 369 DF, p-value: < 2.2e-16
checkresiduals(mont_reg2)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 82.254, df = 10, p-value = 1.812e-13
bikini733_reg = lm(sold_count~basket_count+category_sold,bikini733)
summary(bikini733_reg)
##
## Call:
## lm(formula = sold_count ~ basket_count + category_sold, data = bikini733)
##
## Residuals:
## Min 1Q Median 3Q Max
## -69.876 -1.774 0.799 1.697 39.304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.9360948 0.6314765 -3.066 0.00233 **
## basket_count 0.1966363 0.0031752 61.928 < 2e-16 ***
## category_sold 0.0028759 0.0006391 4.500 9.11e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.97 on 369 degrees of freedom
## Multiple R-squared: 0.9634, Adjusted R-squared: 0.9632
## F-statistic: 4860 on 2 and 369 DF, p-value: < 2.2e-16
checkresiduals(bikini733_reg)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 205.81, df = 10, p-value < 2.2e-16
This is our last regression model, for the “bikini733” data. Both regressors are highly significant and Adjusted R-squared = 0.9632. The checkresiduals() output needs little comment, because the series consists mostly of zeros until its final stretch.
In this part we will use the best ARIMA models for each product, found with the auto.arima() function in the previous part, and extend them with the most significant regressors from the regression models. First we prepare a test set for each product; we will use these test sets to compare the plain ARIMA models with the ARIMA-plus-regressors models.
yuz_temizleyici_test = tail(yuz_temizleyici,7)
islak_mendil_test = tail(islak_mendil,7)
kulaklik_test = tail(kulaklik,7)
supurge_test = tail(supurge,7)
tayt_test = tail(tayt,7)
bikini327_test = tail(bikini327,7)
dis_fircasi_test = tail(dis_fircasi,7)
mont_test = tail(mont,7)
bikini733_test = tail(bikini733,7)
From now on, we will create the regressor variables outside of the model call because of the structure of the xreg parameter, and then pass them into the models with cbind().
price_temizleyici = yuz_temizleyici$price
basketcount_temizleyici = yuz_temizleyici$basket_count
sarimax_yuztemizleyici=arima(detrended_yuztemizleyici,order=c(4,0,0), seasonal = c(0,0,1), xreg= cbind(price_temizleyici,basketcount_temizleyici))
sarimax_yuztemizleyici
##
## Call:
## arima(x = detrended_yuztemizleyici, order = c(4, 0, 0), seasonal = c(0, 0, 1),
## xreg = cbind(price_temizleyici, basketcount_temizleyici))
##
## Coefficients:
## ar1 ar2 ar3 ar4 sma1 intercept price_temizleyici
## 0.4718 0.1354 -0.0896 0.2840 0.1201 17.9565 -0.9378
## s.e. 0.0536 0.0577 0.0587 0.0539 0.0678 48.2357 0.5756
## basketcount_temizleyici
## 0.1546
## s.e. 0.0133
##
## sigma^2 estimated as 622.3: log likelihood = -1697.21, aic = 3412.42
checkresiduals(sarimax_yuztemizleyici)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,0,0)(0,0,1)[7] with non-zero mean
## Q* = 32.682, df = 6, p-value = 1.207e-05
##
## Model df: 8. Total lags used: 14
The new AIC value is 3412.42, which is better than the previous ARIMA model without any regressors.
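Instead of reading the AIC values off the printed summaries, this comparison can also be scripted. A minimal sketch on simulated data (the two fits below are stand-ins for the report's real model objects, e.g. arima_yuztemizleyici and sarimax_yuztemizleyici):

```r
# Compare an ARIMA fit with and without an external regressor via AIC.
# Simulated series and regressor stand in for the real product data.
set.seed(1)
x   <- arima.sim(model = list(ar = 0.5), n = 200)
reg <- rnorm(200)
y   <- x + 0.8 * reg  # series genuinely depends on the regressor

fit_plain <- arima(y, order = c(1, 0, 0))
fit_xreg  <- arima(y, order = c(1, 0, 0), xreg = reg)

# Lower AIC indicates the better-fitting model.
AIC(fit_plain, fit_xreg)
```

Because y was built to depend on reg, the xreg model should come out with the lower AIC, mirroring the improvements reported for the products below.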
basketcount_mendil = islak_mendil$basket_count
categorysold_mendil = islak_mendil$category_sold
categoryvisits_mendil = islak_mendil$category_visits
categoryfavored_mendil = islak_mendil$category_favored
sarimax_islakmendil= arima(detrended_islakmendil,order=c(0,0,1),xreg= cbind(basketcount_mendil,categorysold_mendil,categoryvisits_mendil,categoryfavored_mendil))
sarimax_islakmendil
##
## Call:
## arima(x = detrended_islakmendil, order = c(0, 0, 1), xreg = cbind(basketcount_mendil,
## categorysold_mendil, categoryvisits_mendil, categoryfavored_mendil))
##
## Coefficients:
## ma1 intercept basketcount_mendil categorysold_mendil
## 0.4599 -186.6232 0.2269 -0.0296
## s.e. 0.0388 19.4802 0.0200 0.0305
## categoryvisits_mendil categoryfavored_mendil
## 0.1431 -0.0454
## s.e. 0.0185 0.0037
##
## sigma^2 estimated as 21369: log likelihood = -2343.9, aic = 4701.8
checkresiduals(sarimax_islakmendil)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 92.93, df = 8, p-value < 2.2e-16
##
## Model df: 6. Total lags used: 14
The new AIC value is 4701.8 this time, again better than the previous ARIMA model for the “islak_mendil” data.
visitcount_kulaklik = kulaklik$visit_count
basketcount_kulaklik = kulaklik$basket_count
categorysold_kulaklik = kulaklik$category_sold
categoryvisits_kulaklik = kulaklik$category_visits
sarimax_kulaklik= arima(detrended_kulaklik,order=c(0,0,1),xreg= cbind(visitcount_kulaklik,basketcount_kulaklik,categorysold_kulaklik,categoryvisits_kulaklik))
sarimax_kulaklik
##
## Call:
## arima(x = detrended_kulaklik, order = c(0, 0, 1), xreg = cbind(visitcount_kulaklik,
## basketcount_kulaklik, categorysold_kulaklik, categoryvisits_kulaklik))
##
## Coefficients:
## ma1 intercept visitcount_kulaklik basketcount_kulaklik
## 0.3456 -226.4632 -0.0032 0.0657
## s.e. 0.0474 17.7286 0.0006 0.0104
## categorysold_kulaklik categoryvisits_kulaklik
## 0.2232 -0.0074
## s.e. 0.0345 0.0039
##
## sigma^2 estimated as 8134: log likelihood = -2167.1, aic = 4348.2
checkresiduals(sarimax_kulaklik)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 85.554, df = 8, p-value = 3.664e-15
##
## Model df: 6. Total lags used: 14
The AIC value is 4348.2, slightly better than the previous model for the “kulaklik” data.
favoredcount_supurge = supurge$favored_count
basketcount_supurge = supurge$basket_count
categorysold_supurge = supurge$category_sold
categoryvisits_supurge = supurge$category_visits
sarimax_supurge=arima(detrended_supurge,order=c(2,0,2), seasonal = c(0,0,1), xreg= cbind(favoredcount_supurge,basketcount_supurge,categorysold_supurge,categoryvisits_supurge))
sarimax_supurge
##
## Call:
## arima(x = detrended_supurge, order = c(2, 0, 2), seasonal = c(0, 0, 1), xreg = cbind(favoredcount_supurge,
## basketcount_supurge, categorysold_supurge, categoryvisits_supurge))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 intercept favoredcount_supurge
## 0.1667 0.5097 0.5783 0.1946 0.0538 -40.2289 0.0260
## s.e. 0.1198 0.1092 0.1200 0.0767 0.0490 4.2888 0.0102
## basketcount_supurge categorysold_supurge categoryvisits_supurge
## 0.0381 0.2465 -0.0122
## s.e. 0.0109 0.0162 0.0036
##
## sigma^2 estimated as 160.2: log likelihood = -1449.02, aic = 2920.04
checkresiduals(sarimax_supurge)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,0,1)[7] with non-zero mean
## Q* = 12.421, df = 4, p-value = 0.01448
##
## Model df: 10. Total lags used: 14
Our new AIC value for the “supurge” data is 2920.04. The ARIMA model without regressors had an AIC of about 3200, worse than the new model with regressors.
price_tayt = tayt$price
visitcount_tayt = tayt$visit_count
basketcount_tayt = tayt$basket_count
categorysold_tayt = tayt$category_sold
categoryvisits_tayt = tayt$category_visits
sarimax_tayt=arima(detrended_tayt,order=c(0,0,2), xreg= cbind(price_tayt,visitcount_tayt,basketcount_tayt,categorysold_tayt,categoryvisits_tayt))
sarimax_tayt
##
## Call:
## arima(x = detrended_tayt, order = c(0, 0, 2), xreg = cbind(price_tayt, visitcount_tayt,
## basketcount_tayt, categorysold_tayt, categoryvisits_tayt))
##
## Coefficients:
## ma1 ma2 intercept price_tayt visitcount_tayt basketcount_tayt
## 0.9853 0.5834 -612.8662 -1.2176 -0.0026 -0.0458
## s.e. 0.0619 0.0503 369.9107 7.7129 0.0020 0.0092
## categorysold_tayt categoryvisits_tayt
## 0.5130 -0.0030
## s.e. 0.0387 0.0113
##
## sigma^2 estimated as 92832: log likelihood = -2613.25, aic = 5244.49
checkresiduals(sarimax_tayt)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,2) with non-zero mean
## Q* = 214.62, df = 6, p-value < 2.2e-16
##
## Model df: 8. Total lags used: 14
The new AIC value is 5244.49, which is also better than the previous one.
basketcount_bikini327 = bikini327$basket_count
sarimax_bikini327=arima(detrended_bikini327,order=c(0,0,1), seasonal = c(0,0,1), xreg= basketcount_bikini327)
sarimax_bikini327
##
## Call:
## arima(x = detrended_bikini327, order = c(0, 0, 1), seasonal = c(0, 0, 1), xreg = basketcount_bikini327)
##
## Coefficients:
## ma1 sma1 intercept basketcount_bikini327
## 0.3217 0.0121 -1.1998 0.0186
## s.e. 0.0484 0.0579 0.4552 0.0042
##
## sigma^2 estimated as 28.29: log likelihood = -1131.06, aic = 2272.13
checkresiduals(sarimax_bikini327)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,0,1)[7] with non-zero mean
## Q* = 116.13, df = 10, p-value < 2.2e-16
##
## Model df: 4. Total lags used: 14
Even with this poor data, the AIC value decreased slightly and improved, from 2291.07 to 2272.13.
visitcount_firca = dis_fircasi$visit_count
favoredcount_firca = dis_fircasi$favored_count
basketcount_firca = dis_fircasi$basket_count
sarimax_firca=arima(detrended_disfircasi,order=c(0,0,1), xreg= cbind(visitcount_firca,favoredcount_firca,basketcount_firca))
sarimax_firca
##
## Call:
## arima(x = detrended_disfircasi, order = c(0, 0, 1), xreg = cbind(visitcount_firca,
## favoredcount_firca, basketcount_firca))
##
## Coefficients:
## ma1 intercept visitcount_firca favoredcount_firca
## 0.3724 -25.8785 -0.0114 -0.0202
## s.e. 0.0468 2.7804 0.0013 0.0072
## basketcount_firca
## 0.1461
## s.e. 0.0114
##
## sigma^2 estimated as 705.1: log likelihood = -1719.58, aic = 3451.16
checkresiduals(sarimax_firca)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 44.35, df = 9, p-value = 1.217e-06
##
## Model df: 5. Total lags used: 14
Our new AIC value is 3451.16, which improves on the previous 3607.01.
basketcount_mont = mont$basket_count
sarimax_mont=arima(detrended_mont,order=c(5,0,0), xreg= basketcount_mont)
sarimax_mont
##
## Call:
## arima(x = detrended_mont, order = c(5, 0, 0), xreg = basketcount_mont)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 intercept basketcount_mont
## 0.4817 0.1209 0.0118 0.0316 0.1971 -0.8463 0.1604
## s.e. 0.0538 0.0655 0.0575 0.0596 0.0555 0.5293 0.0076
##
## sigma^2 estimated as 2.674: log likelihood = -699.84, aic = 1415.67
checkresiduals(sarimax_mont)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,0,0) with non-zero mean
## Q* = 40.667, df = 7, p-value = 9.379e-07
##
## Model df: 7. Total lags used: 14
Mont is one of the worst data sets among our products. Its AIC value with regressors is 1415.67, better than the previous model without any regressors.
basketcount_bikini733 = bikini733$basket_count
categorysold_bikini733 = bikini733$category_sold
sarimax_bikini733 = arima(detrended_bikini733,order=c(2,0,1), xreg= cbind(basketcount_bikini733,categorysold_bikini733))
sarimax_bikini733
##
## Call:
## arima(x = detrended_bikini733, order = c(2, 0, 1), xreg = cbind(basketcount_bikini733,
## categorysold_bikini733))
##
## Coefficients:
## ar1 ar2 ma1 intercept basketcount_bikini733
## 0.8681 -0.3950 -1.0000 0.0133 -2e-04
## s.e. 0.0482 0.0481 0.0083 0.0133 3e-04
## categorysold_bikini733
## -1e-04
## s.e. 0e+00
##
## sigma^2 estimated as 57.33: log likelihood = -1262.98, aic = 2539.96
checkresiduals(sarimax_bikini733)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,1) with non-zero mean
## Q* = 42.858, df = 8, p-value = 9.34e-07
##
## Model df: 6. Total lags used: 14
The last model is for the bikini733 data, which is again of poor quality. Its AIC was about 2540 before and is 2539.96 now, so there is essentially no improvement according to the AIC values.
In this part we will make predictions with the predict() function for both models of each product. For the ARIMA models with regressors, predict() needs the newxreg argument, which holds the regressor values over the test set. We prepare these variables outside of the predict() call and bind them into matrices beforehand, because otherwise we get the error that xreg and newxreg do not have the same number of columns. We will create xreg_new1, 2, 3, … variables for all products. After calling predict(), we compute the last trend value and the seasonal component corresponding to our test period (indices 2:8) and add them to the predict() output, so the predicted values include the seasonal and trend parts.
Then we create final data tables that hold the actual sold_count values in the actual column, the plain ARIMA predictions in the arimaprediction column, and the ARIMA-with-regressors predictions in the sarimaxprediction column. We will use these tables when comparing the performance of the models.
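The nine per-product blocks below all follow the same recipe, so the steps could also be wrapped in a single helper. This is only a hypothetical refactoring sketch, assuming each model object, decomposition, and test set follows the naming used in this report; the explicit per-product code is kept below.

```r
library(data.table)

# Hypothetical wrapper for the repeated per-product steps:
# predict with newxreg, then add back the last observed trend value and
# the seasonal component for the 7-day test window (indices 2:8).
build_comparison <- function(arima_fit, sarimax_fit, decomp, test_dt, xreg_new) {
  last_trend   <- tail(decomp$trend[!is.na(decomp$trend)], 1)
  seasonality  <- decomp$seasonal[2:8]
  pred_sarimax <- predict(sarimax_fit, newxreg = xreg_new)$pred + last_trend + seasonality
  pred_arima   <- predict(arima_fit, n.ahead = 7)$pred + last_trend + seasonality
  data.table(actual            = test_dt$sold_count,
             arimaprediction   = as.numeric(pred_arima),
             sarimaxprediction = as.numeric(pred_sarimax))
}
```

For example, the first block below could then be replaced by a single call such as build_comparison(arima_yuztemizleyici, sarimax_yuztemizleyici, dec_weekly_yuztemizleyici, yuz_temizleyici_test, xreg_new).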
## YUZTEMİZLEYİCİ
pricetest_temizleyici = yuz_temizleyici_test$price
basketcounttest_temizleyici = yuz_temizleyici_test$basket_count
xreg_new = cbind(pricetest_temizleyici,basketcounttest_temizleyici)
sarimax_prediction = predict(sarimax_yuztemizleyici, newxreg = xreg_new)
arima_prediction = predict(arima_yuztemizleyici, newdata = yuz_temizleyici_test, n.ahead = 7)
last_trend_temizleyici = tail(dec_weekly_yuztemizleyici$trend[!is.na(dec_weekly_yuztemizleyici$trend)],1)
seasonality_temizleyici = dec_weekly_yuztemizleyici$seasonal[2:8]
predsarimax = sarimax_prediction$pred + last_trend_temizleyici + seasonality_temizleyici
predarima = arima_prediction$pred + last_trend_temizleyici + seasonality_temizleyici
lastyuztemizleyici = data.table()
lastyuztemizleyici = lastyuztemizleyici[,actual:=yuz_temizleyici_test$sold_count]
lastyuztemizleyici = lastyuztemizleyici[,arimaprediction := predarima]
lastyuztemizleyici = lastyuztemizleyici[,sarimaxprediction := predsarimax]
## ISLAKMENDIL
basketcounttest_mendil = islak_mendil_test$basket_count
categorysoldtest_mendil = islak_mendil_test$category_sold
categoryvisitstest_mendil = islak_mendil_test$category_visits
categoryfavoredtest_mendil = islak_mendil_test$category_favored
xreg_new2 = cbind(basketcounttest_mendil,categorysoldtest_mendil,categoryvisitstest_mendil,categoryfavoredtest_mendil)
sarimax_prediction2 = predict(sarimax_islakmendil, newxreg = xreg_new2)
arima_prediction2 = predict(arima_islakmendil, newdata = islak_mendil_test, n.ahead = 7)
last_trend_mendil = tail(dec_weekly_islakmendil$trend[!is.na(dec_weekly_islakmendil$trend)],1)
seasonality_mendil = dec_weekly_islakmendil$seasonal[2:8]
predsarimax2 = sarimax_prediction2$pred + last_trend_mendil + seasonality_mendil
predarima2 = arima_prediction2$pred + last_trend_mendil + seasonality_mendil
lastmendil = data.table()
lastmendil = lastmendil[,actual:=islak_mendil_test$sold_count]
lastmendil = lastmendil[,arimaprediction:=predarima2]
lastmendil = lastmendil[,sarimaxprediction:=predsarimax2]
## KULAKLIK
visitcounttest_kulaklik = kulaklik_test$visit_count
basketcounttest_kulaklik = kulaklik_test$basket_count
categorysoldtest_kulaklik = kulaklik_test$category_sold
categoryvisitstest_kulaklik = kulaklik_test$category_visits
xreg_new3 = cbind(visitcounttest_kulaklik,basketcounttest_kulaklik,categorysoldtest_kulaklik,categoryvisitstest_kulaklik)
sarimax_prediction3 = predict(sarimax_kulaklik, newxreg = xreg_new3)
arima_prediction3 = predict(arima_kulaklik, newdata = kulaklik_test, n.ahead = 7)
last_trend_kulaklik = tail(dec_weekly_kulaklik$trend[!is.na(dec_weekly_kulaklik$trend)],1)
seasonality_kulaklik = dec_weekly_kulaklik$seasonal[2:8]
predsarimax3 = sarimax_prediction3$pred + last_trend_kulaklik + seasonality_kulaklik
predarima3 = arima_prediction3$pred + last_trend_kulaklik + seasonality_kulaklik
lastkulaklik = data.table()
lastkulaklik = lastkulaklik[,actual:=kulaklik_test$sold_count]
lastkulaklik = lastkulaklik[,arimaprediction:=predarima3]
lastkulaklik = lastkulaklik[,sarimaxprediction:=predsarimax3]
## SUPURGE
favoredcounttest_supurge = supurge_test$favored_count
basketcounttest_supurge = supurge_test$basket_count
categorysoldtest_supurge = supurge_test$category_sold
categoryvisitstest_supurge = supurge_test$category_visits
xreg_new4 = cbind(favoredcounttest_supurge,basketcounttest_supurge,categorysoldtest_supurge,categoryvisitstest_supurge)
sarimax_prediction4 = predict(sarimax_supurge, newxreg = xreg_new4)
arima_prediction4 = predict(arima_supurge, newdata = supurge_test, n.ahead = 7)
last_trend_supurge = tail(dec_weekly_supurge$trend[!is.na(dec_weekly_supurge$trend)],1)
seasonality_supurge = dec_weekly_supurge$seasonal[2:8]
predsarimax4 = sarimax_prediction4$pred + last_trend_supurge + seasonality_supurge
predarima4 = arima_prediction4$pred + last_trend_supurge + seasonality_supurge
lastsupurge = data.table()
lastsupurge = lastsupurge[,actual:=supurge_test$sold_count]
lastsupurge = lastsupurge[,arimaprediction:=predarima4]
lastsupurge = lastsupurge[,sarimaxprediction:=predsarimax4]
## TAYT
pricetest_tayt = tayt_test$price
visitcounttest_tayt = tayt_test$visit_count
basketcounttest_tayt = tayt_test$basket_count
categorysoldtest_tayt = tayt_test$category_sold
categoryvisitstest_tayt = tayt_test$category_visits
xreg_new5 = cbind(pricetest_tayt,visitcounttest_tayt,basketcounttest_tayt,categorysoldtest_tayt,categoryvisitstest_tayt)
sarimax_prediction5 = predict(sarimax_tayt, newxreg = xreg_new5)
arima_prediction5 = predict(arima_tayt, newdata = tayt_test, n.ahead = 7)
last_trend_tayt = tail(dec_weekly_tayt$trend[!is.na(dec_weekly_tayt$trend)],1)
seasonality_tayt = dec_weekly_tayt$seasonal[2:8]
predsarimax5 = sarimax_prediction5$pred + last_trend_tayt + seasonality_tayt
predarima5 = arima_prediction5$pred + last_trend_tayt + seasonality_tayt
lasttayt = data.table()
lasttayt = lasttayt[,actual:=tayt_test$sold_count]
lasttayt = lasttayt[,arimaprediction:=predarima5]
lasttayt = lasttayt[,sarimaxprediction:=predsarimax5]
## BİKİNİÜSTÜ 1
basketcounttest_bikini327 = bikini327_test$basket_count
xreg_new6 = basketcounttest_bikini327
sarimax_prediction6 = predict(sarimax_bikini327, newxreg = xreg_new6)
arima_prediction6 = predict(arima_bikini327, newdata = bikini327_test, n.ahead = 7)
last_trend_bikini327 = tail(dec_weekly_bikini327$trend[!is.na(dec_weekly_bikini327$trend)],1)
seasonality_bikini327 = dec_weekly_bikini327$seasonal[2:8]
predsarimax6 = sarimax_prediction6$pred + last_trend_bikini327 + seasonality_bikini327
predarima6 = arima_prediction6$pred + last_trend_bikini327 + seasonality_bikini327
lastbikini327 = data.table()
lastbikini327 = lastbikini327[,actual:=bikini327_test$sold_count]
lastbikini327 = lastbikini327[,arimaprediction:=predarima6]
lastbikini327 = lastbikini327[,sarimaxprediction:=predsarimax6]
## Diş Fırçası
visitcounttest_firca = dis_fircasi_test$visit_count
favoredcounttest_firca = dis_fircasi_test$favored_count
basketcounttest_firca = dis_fircasi_test$basket_count
xreg_new7 = cbind(visitcounttest_firca,favoredcounttest_firca,basketcounttest_firca)
sarimax_prediction7 = predict(sarimax_firca, newxreg = xreg_new7)
arima_prediction7 = predict(arima_disfircasi, newdata = dis_fircasi_test, n.ahead = 7)
last_trend_firca = tail(dec_weekly_disfircasi$trend[!is.na(dec_weekly_disfircasi$trend)],1)
seasonality_firca = dec_weekly_disfircasi$seasonal[2:8]
predsarimax7 = sarimax_prediction7$pred + last_trend_firca + seasonality_firca
predarima7 = arima_prediction7$pred + last_trend_firca + seasonality_firca
lastfirca = data.table()
lastfirca = lastfirca[,actual:=dis_fircasi_test$sold_count]
lastfirca = lastfirca[,arimaprediction:=predarima7]
lastfirca = lastfirca[,sarimaxprediction:=predsarimax7]
# Mont
basketcounttest_mont = mont_test$basket_count
xreg_new8 = basketcounttest_mont
sarimax_prediction8 = predict(sarimax_mont, newxreg = xreg_new8)
arima_prediction8 = predict(arima_mont, newdata = mont_test, n.ahead = 7)
last_trend_mont = tail(dec_weekly_mont$trend[!is.na(dec_weekly_mont$trend)],1)
seasonality_mont = dec_weekly_mont$seasonal[2:8]
predsarimax8 = sarimax_prediction8$pred + last_trend_mont + seasonality_mont
predarima8 = arima_prediction8$pred + last_trend_mont + seasonality_mont
lastmont = data.table()
lastmont = lastmont[,actual:=mont_test$sold_count]
lastmont = lastmont[,arimaprediction:=predarima8]
lastmont = lastmont[,sarimaxprediction:=predsarimax8]
## BİKİNİ2
basketcounttest_bikini733 = bikini733_test$basket_count
categorysoldtest_bikini733 = bikini733_test$category_sold
xreg_new9 = cbind(basketcounttest_bikini733,categorysoldtest_bikini733)
sarimax_prediction9 = predict(sarimax_bikini733, newxreg = xreg_new9)
arima_prediction9 = predict(arima_bikini733, newdata = bikini733_test, n.ahead = 7)
last_trend_bikini733 = tail(dec_weekly_bikini733$trend[!is.na(dec_weekly_bikini733$trend)],1)
seasonality_bikini733 = dec_weekly_bikini733$seasonal[2:8]
predsarimax9 = sarimax_prediction9$pred + last_trend_bikini733 + seasonality_bikini733
predarima9 = arima_prediction9$pred + last_trend_bikini733 + seasonality_bikini733
lastbikini733 = data.table()
lastbikini733 = lastbikini733[,actual:=bikini733_test$sold_count]
lastbikini733 = lastbikini733[,arimaprediction:=predarima9]
lastbikini733 = lastbikini733[,sarimaxprediction:=predsarimax9]
We will use MAPE as the performance measure for the predicted values. We again start with the “yuz_temizleyici” data and calculate it for each product one by one.
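The same MAPE formula, 100 * mean(|actual - predicted| / |actual|), is repeated for every product below; it could be factored into one small function. A minimal sketch:

```r
# Mean Absolute Percentage Error, as computed throughout this section:
# 100 * sum(|actual - predicted| / |actual|) / n
mape <- function(actual, predicted) {
  100 * sum(abs(actual - predicted) / abs(actual)) / length(actual)
}

# Small worked check: 10% error on both points gives a MAPE of 10.
mape(c(100, 200), c(110, 180))  # 10
```

Note that MAPE is undefined when an actual value is zero, which is one reason the zero-heavy series (bikini733, mont) are hard to evaluate this way.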
“yuz_temizleyici”
MAPE_arima_temizleyici = 100 * sum(abs(lastyuztemizleyici$actual-lastyuztemizleyici$arimaprediction)/abs(lastyuztemizleyici$actual))/ 7
MAPE_arima_temizleyici #20.05795
## [1] 20.05795
MAPE_sarimax_temizleyici = 100 * sum(abs(lastyuztemizleyici$actual-lastyuztemizleyici$sarimaxprediction)/abs(lastyuztemizleyici$actual)) / 7
MAPE_sarimax_temizleyici #15.13896
## [1] 15.13896
The regressors in our ARIMA model for the “yuz_temizleyici” data improved performance by approximately 5 percentage points, from about 20% MAPE to 15% MAPE.
“islak_mendil”
MAPE_arima_mendil = 100 * sum(abs(lastmendil$actual-lastmendil$arimaprediction)/abs(lastmendil$actual)) / 7
MAPE_arima_mendil #23.41709
## [1] 23.41709
MAPE_sarimax_mendil = 100 * sum(abs(lastmendil$actual-lastmendil$sarimaxprediction)/abs(lastmendil$actual)) / 7
MAPE_sarimax_mendil #560.7457
## [1] 560.7457
In this case the MAPE of the plain ARIMA forecasts is 23.42%; however, the sarimaxprediction column contains some strange values, and its last three daily predictions are meaningless, with values such as 8000 and 16500. We could not locate the mistake; most probably we are overlooking something small. So the plain ARIMA model gives the better performance here.
“kulaklik”
MAPE_arima_kulaklik = 100 * sum(abs(lastkulaklik$actual-lastkulaklik$arimaprediction)/abs(lastkulaklik$actual)) / 7
MAPE_arima_kulaklik #18.82598
## [1] 18.82598
MAPE_sarimax_kulaklik = 100 * sum(abs(lastkulaklik$actual-lastkulaklik$sarimaxprediction)/abs(lastkulaklik$actual)) / 7
MAPE_sarimax_kulaklik #199.0838
## [1] 199.0838
In this case the MAPE of the ARIMA model is 18.83%, while the SARIMAX prediction’s MAPE is 199.08%, again far too high. This may be related to the number of regressors: we used four regressors for both “islak_mendil” and “kulaklik”, and in our opinion predicting with this many columns in newxreg may be the reason behind these meaningless MAPE values.
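One way to probe this hypothesis would be to compare the test-window regressor values against their ranges in the fitting data: forecasts blow up when newxreg pushes the model far outside what it saw during fitting. A hypothetical diagnostic sketch, assuming the kulaklik regressor vectors and xreg_new3 defined earlier in this report:

```r
# Hypothetical diagnostic: compare each regressor's range in the fitting
# data with its values in the 7-day test window. Test values far outside
# the fitting range suggest the SARIMAX forecast is extrapolating.
xreg_fit <- cbind(visitcount_kulaklik, basketcount_kulaklik,
                  categorysold_kulaklik, categoryvisits_kulaklik)
rbind(fit_min  = apply(xreg_fit, 2, min),
      fit_max  = apply(xreg_fit, 2, max),
      test_min = apply(xreg_new3, 2, min),
      test_max = apply(xreg_new3, 2, max))
```

A column-order mismatch between xreg and newxreg would produce a similar blow-up, so checking colnames(xreg_fit) against colnames(xreg_new3) is also worthwhile.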
“supurge”
MAPE_arima_supurge = 100 * sum(abs(lastsupurge$actual-lastsupurge$arimaprediction)/abs(lastsupurge$actual)) / 7
MAPE_arima_supurge #51.41471
## [1] 51.41471
MAPE_sarimax_supurge = 100 * sum(abs(lastsupurge$actual-lastsupurge$sarimaxprediction)/abs(lastsupurge$actual)) / 7
MAPE_sarimax_supurge #2822.228
## [1] 2822.228
Again, the four regressors passed via newxreg to the predict() function may be the reason behind the 2822.23% MAPE.
“tayt”
MAPE_arima_tayt = 100 * sum(abs(lasttayt$actual-lasttayt$arimaprediction)/abs(lasttayt$actual)) / 7
MAPE_arima_tayt #78.91206
## [1] 78.91206
MAPE_sarimax_tayt = 100 * sum(abs(lasttayt$actual-lasttayt$sarimaxprediction)/abs(lasttayt$actual)) / 7
MAPE_sarimax_tayt #172.2953
## [1] 172.2953
The arima model without regressors gives better performance; again, we passed too many regressors to new_xreg in the predict function for the "tayt" data.
“bikini327”
MAPE_arima_bikini327 = 100 * sum(abs(lastbikini327$actual-lastbikini327$arimaprediction)/abs(lastbikini327$actual)) / 7
MAPE_arima_bikini327 #12.93699
## [1] 12.93699
MAPE_sarimax_bikini327 = 100 * sum(abs(lastbikini327$actual-lastbikini327$sarimaxprediction)/abs(lastbikini327$actual)) / 7
MAPE_sarimax_bikini327 #15.70526
## [1] 15.70526
This time the sarimax MAPE is a meaningful value; we used only 1 regressor in the model. Still, the plain arima model performs better when predicting sold_count for "bikini327".
“dis_fircasi”
MAPE_arima_firca = 100 * sum(abs(lastfirca$actual-lastfirca$arimaprediction)/abs(lastfirca$actual)) / 7
MAPE_arima_firca #26.5474
## [1] 26.5474
MAPE_sarimax_firca = 100 * sum(abs(lastfirca$actual-lastfirca$sarimaxprediction)/abs(lastfirca$actual)) / 7
MAPE_sarimax_firca #19.52496
## [1] 19.52496
Here the sarimax predictions, produced by an arima model with regressors, perform better. In this case we used 3 regressors. Whenever new_xreg contained 4 or more regressors, our predict calls started to produce meaningless values.
“mont”
MAPE_arima_mont = 100 * sum(abs(lastmont$actual-lastmont$arimaprediction)/abs(lastmont$actual)) / 7
MAPE_arima_mont #43.95341
## [1] 43.95341
MAPE_sarimax_mont = 100 * sum(abs(lastmont$actual-lastmont$sarimaxprediction)/abs(lastmont$actual)) / 7
MAPE_sarimax_mont #43.09254
## [1] 43.09254
Both models have almost the same performance in this case. Since our "mont" data contains many zeros, these high and close MAPE values are expected. With better data we could make more accurate predictions and a more meaningful comparison of the two models.
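The zero problem can be quantified directly. A short sketch (assuming the `sold_count` column from the data preparation step) that reports the share of zero-sales days in a product's series:

```r
# Share of days with zero sales; a high share makes MAPE unstable,
# since the formula divides by abs(actual).
zero_share <- function(sold) mean(sold == 0)

# e.g. zero_share(mont$sold_count)
```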
“bikini733”
MAPE_arima_bikini733 = 100 * sum(abs(lastbikini733$actual-lastbikini733$arimaprediction)/abs(lastbikini733$actual)) / 7
MAPE_arima_bikini733 #13.99726
## [1] 13.99726
MAPE_sarimax_bikini733 = 100 * sum(abs(lastbikini733$actual-lastbikini733$sarimaxprediction)/abs(lastbikini733$actual)) / 7
MAPE_sarimax_bikini733 #14.13329
## [1] 14.13329
This data also suffers from many zeros. The MAPE values themselves look good, but comparing them may not be meaningful. The arima model without regressors performs slightly better. Data without zeros would allow a fairer comparison of the two models.
We tried several models for forecasting sold_count in our data, which gives daily Trendyol sales amounts for 9 different products. We first fit arima models, then regression models, and finally combined them into arima models with regressors. Adding more than 3 regressors to the arima models produced meaningless predicted values, but models with 1, 2 or 3 regressors outperformed the arima models without regressors. Another problem was the large number of zeros in some of the series, which prevented us from drawing conclusions about those models and predictions.
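The per-product results above can be collected into a single table for a side-by-side comparison. A sketch using data.table (already loaded), with the MAPE values copied from the outputs reported above (the 20% arima figure for "yuz_temizleyici" is the approximate value mentioned in the text):

```r
library(data.table)

# MAPE values (%) copied from the results above.
mape_summary <- data.table(
  product = c("yuz_temizleyici", "islak_mendil", "kulaklik", "supurge",
              "tayt", "bikini327", "dis_fircasi", "mont", "bikini733"),
  arima   = c(20.00, 23.42, 18.83, 51.41, 78.91, 12.94, 26.55, 43.95, 14.00),
  sarimax = c(15.14, 560.75, 199.08, 2822.23, 172.30, 15.71, 19.52, 43.09, 14.13)
)

# Which model wins for each product.
mape_summary[, winner := ifelse(sarimax < arima, "sarimax", "arima")]
```

The table makes the pattern explicit: sarimax wins only for the three products where we used at most 3 regressors ("yuz_temizleyici", "dis_fircasi", "mont").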